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Abstract — Evaluating the reliability of methods for reconstructing evolutionary trees is discussed under 
the four headings of: evaluating criteria for an optimal tree, finding the optimal tree for the criterion 
selected, detecting reliable and unreliable data, and estimating the error range for the final tree. It 
is shown with five data sets (protein sequences) that, in general, the minimal tree is a better estimate 
of phylogeny than a longer tree. However, for each data set, the minimal tree was no longer the 
shortest when the sequences were combined. An objective weighting of columns (characters) can lead 
to an improved tree by giving less weight to columns that are closer to a random order. The weighting 
of characters is derived from the ratio of the observed to expected number of incompatabilities for 
each column. Several forms of weighting give better trees as measured by both the increase in correlation 
between lengths of trees with different subsets of data, and by an increase in the similarity between 
minimal trees found with disjoint subsets of data. Increasing the size of randomly selected subsets, 
and measuring the increased similarity of the results, can lead to an estimate of the minimum number 
of trees that need to be considered as possibly the correct historical tree. A measure of the ‘treeness’ 
of the data is described that estimates the extent to which a binary tree is a good description of the data. 


Introduction 


Almost as soon as macromolecules were sequenced, predictions were made that the 
sequences contained information about evolutionary history (see Terzaghi et al., 1984). 
Such predictions are easy to make, but very difficult to evaluate quantitatively. Many 
methods are available for estimating evolutionary trees (Peacock, 1981; Felsenstein, 
1982; Cedergren et al., 1981) and although the results may have proved satisfactory 
in a qualitative manner (Terzaghi et al., 1984), there are still few quantitative studies. 

In an earlier study (Penny et al., 1982) we showed that evolutionary trees constructed 
from five different sequences were much more similar to each other than expected by 
chance. However, none of the minimal trees from the five proteins were identical, nor 
were any of them identical to the minimal trees found from the combined sequences. 
The result of trees being highly similar, but not identical, is expected in an evolutionary 
process with a strong stochastic element. Nevertheless, it leaves a major problem of 
just how close a minimal tree is to the historical evolutionary tree (T"). 

Most studies still: (a) have not evaluated the criterion for choosing the optimum tree; 
(b) do not guarantee to find the optimal tree for the selected criterion; (c) do not 
discriminate between reliable and unreliable data; and (d) are unable to estimate a 
possible error range. We will briefly consider each of these problems. 

Three main approaches have been used to test tree building methods and we will 
call these analytical, simulation, and empirical. Examples of the analytical approach include 
Felsenstein’s (1981) analysis of the conditions necessary for minimal length trees to 
converge on the correct tree and our own (Penny, 1982) on reasons why phenetic methods 
do not use all the information in the data. The main limitation of the approach is that 
it is not usually known whether the assumptions of a model are valid, for example 
whether there is a low probability of change on a line of descent (Felsenstein, 1981). 
For this reason we cannot make full use of the analytical approach until it is possible 
to test independently the assumptions for a given data set. 

Many workers have used simulation; to follow the spirit of the method we will 
randomly select three examples: the early work of Peacock and Boulter (1975), and 
the more recent work of Astolfi et al. (1981) and Tateno et al. (1982). However, since 
1975, we have not used simulation to generate data because we found that there were 


1985] TREE CONSTRUCTION METHODS 267 


large numbers of parameters that could be varied, and it was premature to test just 
one, or a few, sets of conditions. Again it is necessary to be able to test independently 
the assumptions on a data set before full use of simulation can be made. It is better 
at present to use real data and to determine what can be learned about the evolutionary 
process; this leads to the third approach. 

The empirical approach uses biological data and attempts to test hypotheses about 
the data. Most of the methods can be included under the general concept of congruence 
(Mickevich, 1978; Schuh and Polhemus, 1980); this has been used widely but the 
methods still need to be made more powerful. This paper is an application of the 
empirical approach being used to test more thoroughly methods and models of 
evolutionary tree construction. 


Materials and Methods 


The data is a set of 132 nucleotides derived from five proteins whose sequence was 
known for the same 11 mammals (Table 2 of Penny et al., 1982, but with the omission 
of column 85). We have used four methods to select subsets of columns. The first simply 
compares each protein against the other four sequences combined. For example, the 
lengths of trees derived from cytochrome c are compared with the lengths of the same 
trees using the combined sequences of the other four proteins. 

The other methods use randomly selected subsets. The second randomly partitions 
the data into two subsets of 66 columns (Schuh and Polhemus, 1980); each subset is 
knownzas a “halfling” or “Hobbit” and we call this “the method of Hobbits.” In one test 
the procedure was repeated on the halflings to give four disjoint subsets (quarterlings). 

The third method randomly selects columns without replacement. Subsets with 17, 
33, 66, and 122 columns were used; this is a form of resampling called “jacknifing” 
(see review by Efron, 1982). In the fourth method, “bootstrapping” (Efron, 1982), sets 
of 132 columns are selected with replacement. 


Results 


Maximum PARSIMONY — ARE SHORTER TREES BETTER? 


This is a test of the maximum parsimony criterion and congruence; a shorter tree 
with one set of data is expected to be shorter when tested against new data (Mickevich, 
1978). 

The following two-step procedure was used. (1) For each of the five sets of data, all 
minimal and near minimal trees were found by our branch and bound method (Hendy 
and Penny, 1982). In each case the upper limit was chosen so that no more than 16,000 
trees were selected [out of the (2n-5)!! = 34,459,425 possible unrooted binary trees]. 
This approach is only possible with a method such as a branch and bound method that 
can guarantee to have found all trees shorter than a specified length. With cytochrome 
c some sequences were identical, so fewer trees were selected in order that a similar 
proportion of the total trees was considered. The numbers of trees for each protein is 
given in Table 1. 

(2) The trees from each of the five sets of data in step (1) were then re-analyzed with 
the complementary data set. That is, for each minimal and near minimal tree from 
fibrinopeptide A, their lengths were calculated with the combined data from four other 
proteins. This was repeated for each of the original five proteins. 

The results of this analysis are given in Figure 1. The lengths of the trees for single 
proteins are plotted on the x-axis and the lengths of the same trees with the remaining 
data are plotted on the y-axis. The bars represent twice the standard error of the mean. 

Many statistical tests cannot be applied to these data because they depart dramatically 
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Table 1 


Number of minimal and near minimal trees for five sequences. 





Sequence Length of Number of trees (minimal plus 0,1,2,etc.) 
minimal tree 

0 1 2 3 4 5 6 7 8 9 
Hemoglobin a 89 1 5 26 143 400 950 1838 3529 6124 10431 
Hemoglobin 8 124 3 4 25 41 105 204 332 613 1030 1585 
Fibrinopeptide A 29 1 37 403 2724 12449 - - - - - 
Fibrinopeptide B 36 8 126 475 1313 4660 - - - - - 
Cytochrome c 16 6 55 195 321 368 - - - - - 





from randomly selected data from a normal population. However, they meet the 
requirements for nonparametric tests and Spearman coefficients of rank correlation Q 
were calculated for each data set. This coefficient was used because it appeared to handle 
adequately the large number of tied observations. The values obtained for @ were 
hemoglobin a 0.39, hemoglobin 8 0.41, cytochrome c 0.44, fibrinopeptide A 0.34, and 
fibrinopeptide B 0.37. The conclusion for all five proteins is that on average a shorter 
tree with one protein will also be shorter when additional data are used. To this extent 
maximum parsimony is a valid criterion for this data set. 

However, it is clear also from these results that the shortest tree for a single protein 
may not be the best tree when the data from all five proteins are used. The length of 
a tree with the combined data can be found by adding the values on both the x- and 
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Fig. 1. The lengths of minimal and near minimal trees for a given protein (x-axis) plotted against the 
length of the same trees with the other four proteins (y-axis). For example, trees with a length of 30 with 
fibrinopeptide A have an average length of just over 302 with the combined data from the other four sequences 
(cytochrome c, fibrinopeptide B, hemoglobins a and 8). The bars are twice the standard error of the mean. 
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y-axes for any particular tree in Figure 1. For example, with fibrinopeptide A the shortest 
tree is of length 29, with the other four proteins its length is 292, giving a combined 
length of 321 for the five proteins. 

But it is already known that the two shortest trees for the combined five proteins 
require only 308 changes (Penny et al., 1982). When we consider the 37 trees of length 
30 (one longer than minimal for fibrinopeptide A) we find that their lengths with the 
other four proteins vary from 284-332, giving a range for the combined five proteins 
from 316 to 332. With the combined data the average length of the trees of length 30 
(with fibrinopeptide A) is indeed longer than that for the tree of length 29, but some 
individual trees of length 30 are shorter on the combined data. Conversely, the shortest 
trees for the combined five proteins (two trees of length 308) had lengths 31 and 32 
with fibrinopeptide A. 

None of the minimal trees for individual proteins was also shortest with the combined 
data, and indeed none of the minimal trees for single proteins was in the group of 11 
trees that were of lengths 308 to 310. This finding highlights the need for both improving 
the criterion for the optimal tree, and the need for some estimate of the error. It is 
no longer acceptable just to find “the shortest tree” and hope for the best. 

At this point it can be claimed objectively that maximum parsimony is a valid criterion 
and that methods do exist for its calculation on small to medium-sized data sets (Hendy 
and Penny, 1982). These are the first two requirements outlined above. It cannot of 
course be claimed that maximum parsimony is the best criterion but the procedure does 
allow other criteria to be tested against maximum parsimony. 


GUARANTEE OF OPTIMALITY 


Only two points will be made here. The branch and bound method (Hendy and 
Penny, 1982), if it finishes a run, has found all trees shorter than the bound. And the 
method is very easily modified to allow the criterion to include objective character 
weighting. 


RELIABLE AND UNRELIABLE Data 


With any process with a strong stochastic element it is to be expected that there is 
“noise” in the data. In this study compatibility analysis has been combined with minimal 
length trees in order to estimate the noise, and to get more reliable trees. This part 
of the study was subdivided into three sub-goals: (1) quantify information from inter- 
nucleotide comparisons; (2) define strategies for weighting nucleotide positions (the null 
strategy of no weighting has been favored up to the present); and (3) develop methods 
for evaluating the effectiveness of both weighting strategies and tree building methods. 

1. Information from Inter-Nucleotide Comparisons — Compatibility. 

Compatibility analysis (Estabrook et al., 1976; Fitch, 1977; Guise et al., 1982; Hendy 
et al., 1978; Le Quesne, 1969; Sneath et al., 1975) allows a measure of the extent to 
which columns are consistent. For each column : we use the ratio of the observed to 
the expected number of incompatibilities (O;/E;) and test whether this ratio (O;/Ej) is 
able to detect information on the reliability of each column. The ratio O;/E; is plotted 
against the number of “duplicate changes” (parallel, convergent, or reverse changes) 
on the minimal trees. For each column, the proportion of the maximum possible number 
of duplications is plotted against this ratio, O;/E;. Figure 2 gives the results for the 
hemoglobin £ data of Penny et al., 1982). There is a strong positive correlation which 
demonstrates that the ratio O;/E; gives a good prediction of the reliability of each 
nucleotide position. 

To obtain the ratio O;/E;, all pairs of columns are compared. For each column, the 
observed (O;) number of incompatibilities are counted, and the expected number (E;) 
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Fig. 2. Plot of incompatibilities against duplications for the hemoglobin $ data. The x-axis is the ratio 
of observed to expected incompatibilities O,/E;. The y-axis is the proportion of the maximum number of 
duplicated character states (whether reversions, parallel, or convergent changes). 


is calculated on the assumption that there is no interaction between columns. The original 
method of Le Quesne (1969) is modified for characters with up to four discrete states. 
(More than one incompatibility is possible for many pairs of characters.) Fisher’s exact 
test (Sokal and Rohlf, 1981) is used to calculate the expected frequency of every possible 
combination. This test is usually used on a 2 x 2 contingency table but there are no 
difficulties in applying it to a R x C contingency table where (as with nucleotides) both 
R and C<5. 

This is the first sub-goal. A method, using the ratio Oj/Ej, is established for 
quantifying information from inter-nucleotide comparisons and this method is giving 
an indication of the expected reliability of each column. 

We have found the concept of treeness to be useful in indicating how much the data 
deviates from randomness. In this study we have defined the treeness of the data as 
(1 - O/E), where O and E are summed over all columns. A related concept has been 
used on distance matrices (Eigen and Winkler-Oswatitisch, 1981) and we have used 
complexity (Penny et al., 1980) which is only defined after the minimal tree is known. 
This molecular data has more randomness (treeness = 0.28) than, for example, the 
morphological data of the Berberidaceae (Meachem, 1980), which has a treeness value 
of 0.50. 
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2. The Second Subgoal is the Objective Weighting of Columns. 

With morphological data, several methods have been proposed to weight characters 
(Davies, 1981; Farris, 1969; Felsenstein, 1981; Hansell and Ewing, 1973; Hogeweg, 
1976; Whiffin, 1982). The most effective forms of weighting that we have tried so far 
are based on weighting each column as (1 - O,/E;). A weight of one means that a column 
is compatible with all columns. A value of zero means that the states of the column 
are randomly distributed when paired with other columns (although not necessarily 
with triples or higher groups). For some columns O;>E, and these show negative 
weightings; they must be given a value of zero for the branch and bound method to work. 

The weights (W; = maximum{[1 - O;/E;,0]) can be used directly but we have tried 
several modifications because the curve in Figure 2 is not linear through the origin. 
These use W; to different powers, logarithms, sines, and so forth. This is the second 
sub-goal, defining methods for objective weighting. 

3. Evaluating the Effectiveness of Weighting. 
3a. Correlation of Tree Lengths. 

For each of the 20 halflings the branch and bound method (Hendy and Penny, 1982) 
was used without weighting to find all minimal trees, together with all the near minimal 
trees (in this case trees that required no more than 5 extra nucleotide changes). For 
these 20 subsets there were a total of 48 minimal trees and an additional 3674 near 
minimal trees. For each halfling the total number of trees in the set varied from 51 
to 397. The lengths of these trees were recalculated with the data from the complementary 
subset. Thus, for each tree its length was known for two sets of data, the subset for 
which it is minimal (or near minimal), and for the complementary subset. 

Three examples are shown in Figure 3. The trees are minimal or near minimal for 
the first halfling. Their lengths are shown for this subset (with and without weighting), 
and for the complementary (disjoint) subset (again with and without weighting). The 
correlation coefficient (r) was calculated on the lengths of all trees in each subset. We 
would expect a positive correlation between the lengths of the trees with the disjoint 
subsets. The question is whether weighting gives a higher value for r. 
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Fig. 3. Three trees and their lengths with weighting (squares) and without (none). The lengths of trees 
are given for the original data (the data for which they are minimal) plus for the complementary halfling (new). 
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Without weighting there is a significant positive correlation but it is quite low (r = 0.30, 
Table 2). (It should be noted that these trees all came from the extreme low end of 
the range of tree lengths.) With several forms of weighting the correlations of lengths 
are significantly higher. It was found that r=0.54 with simple linear weights 
(W; = max[1 - O;/E;/,0]) but the highest value (r=0.59) was found with squares 
weighting W;?. 

The poor result found with the maximal clique (selecting the largest subset of positions 
that are all mutually compatible) was unexpected. A closer examination of this result 
indicates that in these data the maximal clique is inefficient in finding the innermost 
links in the tree, but better for selecting closely related taxa. It should be pointed out 
that the maximal cliques included only 17-25 of the 66 columns. (Note that this method 
is not comparing trees found from the maximal clique; the trees are minimal on all 
the data.) However, the maximal clique may not be the best method of using the 
information in the incompatibility matrix; the weights used in this paper appear to make 
better use of the information. 

The conclusion from this section is that weighting leads to closer agreement between 
the lengths of trees with disjoint (that is, having no data in common, non-overlapping) 
data sets; this occurred even when the trees has been found without weighting. 
3b. Comparison of Trees. 

Another method to evaluate the effectiveness of weighting is to compare minimal 
trees from different pairs of subsets. No weighting, linear, and squares weighting were 
selected for further study. The branch and bound method (Hendy and Penny, 1982) 
for finding minimal trees is readily modified to weight all nucleotide changes according 
to the weight of a particular column (W)). 

The partition metric (Robinson and Foulds, 1981) was used to compare minimal 
trees found with and without weighting. The partition metric is very easy to calculate 
on two (or more) trees (Penny and Hendy, 1985); its probability distribution is known 
for both binary and non-binary trees (Penny et al., 1982; Hendy et al., 1984), and 
it can be modified for several other applications (Penny and Hendy, 1985). For 11 taxa 
it takes even values between zero (trees identical) and 16 (2n-- 6, no similarities). In 
this case the test is to measure the similarity of minimal trees from the disjoint subsets; 
the question is whether minimal trees found with weighting are more similar than those 
found without weighting. 

Without weighting there was an average difference of 7.22 between the minimal trees 
from two halflings (66 columns) (Table 3). With 11 taxa, the probability (by logarithmic 
interpolation) of randomly obtaining trees this similar is 1.645 x 10 * (Penny et al., 


Table 2 
Correlation coefficients; average values and standard errors of the mean. Weightings used are: None, W; = 1; 
Max clique, W; = 1 if column is in the maximal clique, else = 0; Linear, W; = max (0,(1 - O/E)); Squares, 
Square root, and Logarithm are derived from linear; Sine, W; = (1 - costW;)/2. 


Weighting Average S.E.M. 


None 300 .028 
Max clique .005 .071 
Linear 2545 .038 
Squares 595 .047 
Square root -469 .038 
Sine 567 042 


Logarithm -527 -037 
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Table 3 


The average distance between minimal trees with different forms of weighting and for halflings (66 columns) 
or quarterlings (33 columns). The right hand column is the difference between minimal trees from the same 
data set (ignoring cases where there is only one minimal tree). 


Halflings Quarterlings Within 

Pairs None Linear Squares None Linear Squares Halflings 
1 2 7.60 5.00 6.00 7.83 6.00 8.00 5.00 
3 4 7.33 4.00 2.00 6.76 5.00 4.00 4.03 
5 6 5.50 6.00 4.00 8.29 8.00 6.00 4.00 
7 8 6.00 8.00 7.33 6.46 9.00 9.00 2.00 
9 10 6.00 4.00 2.00 8.08 7.66 5.00 3.33 
11 12 9.00 5.14 4.00 10.45 8.00 8.00 4.67 
13 14 7.00 3.00 4.00 8.15 9.00 9.00 6.00 
15 16 7.00 5.50 4.00 7.66 6.00 5.00 2.90 
17 18 7.25 4.00 4.00 8.17 9.67 10.00 6.00 
19 20 9.50 8.00 3.00 8.66 9.00 10.00 6.00 
mean 7.22 5.26 4.03 8.05 7.73 7.40 4.39 


trees 5672 697 183 13013 9400 6863 271 


1982; Hendy et al., 1984). By interpolation this corresponds to 5672 trees 
(1.645 x 10° * x 34,459,425). 

With linear weighting the average difference between minimal trees fróm comple- 
mentary sets of data was 5.26; this corresponds to 697 trees. We use the ratio of the 
size of these subsets as a measure of the improvement in the accuracy of a technique. 
We call the increase in this ratio the improvement in precision or sharpness of our estimate. 
We do not know the absolute value of the precision, rather we estimate the increase 
in precision from modifying the technique. In this case linear weighting has improved 
the precision of our estimate by at least 8 times (5672:697). 

With squares weighting the average difference between minimal trees was 4.03, which 
corresponds to 183 trees. The precision is thus increased 33 times (5672:183) over the 
unweighted method. Our conclusion is that objective weighting of characters has 
increased the precision of the maximum parsimony method, and squares weighting is 
the best method tested so far. We think that objective weighting increases the reliability 
because it uses more of the information in the data. It is interesting that the information 
being used to weight characters (comparisons between columns) is the information that 
is discarded by phenetic methods in forming a distance matrix (Penny, 1982). Weighting 
also reduces the number of trees that are minimal; with squares weighting there is seldom 
more than one minimal tree. 

Finally we point out that in previous work without weighting (Penny et al., 1982), 
minimal trees from the same data set were more similar than minimal trees from different 
data sets. The same is found here; without weighting there is an average difference 
of 4.39 between minimal trees from the same subset (right hand column in Table 3), 
compared with an average value of 7.22 between minimal trees from disjoint subsets. 
There is a major improvement when squares weighting is used; the average distance 
between minimal trees from disjoint subsets is now only 4.03 (Table 3). In this data 
set, and with squares weighting, minimal trees from different data sets are more similar 
than are minimal trees within a data set without weighting. (In this case the difference 
is not statistically significant between these last two values, but the gain from weighting 
is, and this is the main point.) 
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ESTIMATE OF THE POSSIBLE ERROR RANGE 


It is necessary to study the effect of using only half the data; our approach has been 
to randomly select subsets of different size. Minimal trees were found for all subsets 
without weighting and with squares weighting. The test was to measure the similarity 
of trees as more nucleotides were added. An aim was to detect whether the trees were 
converging on a small subset of trees (Felsenstein, 1983), and whether this occurred 
faster with weighting. Note that in this method two subsets may have columns in common 
and trees will appear more similar than when comparing disjoint subsets that have no 
columns in common. In this test squares weighting has been compared with no 
weighting. 

Minimal trees from subsets with 17, 33, 66, and 122 columns were compared; in 
each case minimal trees from squares weighting were more similar than without 
weighting (Fig. 4). As expected, the trees were more similar with larger subsets, and 
the differences between weighted and unweighted were increased. There is a large 
improvement in going from 33 to 66 nucleotide positions. The difference is reduced 
again with 122 columns, but this includes the effect that the two subsets have most 
of their columns the same. If the values are extrapolated to 132 columns, then the 
difference is about 1.1, which is equivalent to 8 trees. This is our lower bound on how 
many trees could be T*. We estimate that at least twice as many informative nucleotide 
positions are needed before we could be confident that the minimal tree is T*. 

The minimal tree with squares weighting and using all 132 columns is shown in 
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Fig. 4. Average similarity of trees from different sized subsets of the data. The values are from columns 
selected randomly and without replacement (two subsets may have some columns in common). Values with 


closed circles are without weighting, and those with open circles are for squares weighting, y axis extends 
from 0 to 5. 


1985] TREE CONSTRUCTION METHODS 275 


Figure 5. We would expect that this tree would be the best available estimate of T* 
but we require a quantitative estimate of the reliability of the final tree (Fig. 2). This 
tree is similar to the minimal trees without weighting (Penny et al., 1982), the main 
differences being that weighting has placed horse and pig in their expected positions, 
and has brought dog within the main group of eutherians. 

Near minimal trees have been found with the complete data set and using squares 
weighting. The closest tree to the minimal tree reverses the positions of rabbit and mouse 
and another tree just slightly longer has these taxa on a separate side branch. Clearly 
the exact relationship of these taxa is the least certain conclusion on the minimal tree 
and the three possibilities can be represented as a non-binary tree by joining rabbit, 
mouse, and primates to a single point. Although their exact relationship is uncertain, 
the close relationship of rodents and lagomorphs is one of the strongest conclusions from 
molecular studies (Penny and Hendy, 1985). 

The next least certain conclusion is the position of dog with respect to the primate, 
ungulate, or ancestral eutherian-marsupial lines. Any of these three positions are close 
to minimal and therefore there are at least 9 (3 x 3) trees that cannot be excluded from 
this data. We have also taken 20 bootstrap samples (sampling columns with replacement), 
found the minimal trees for each, and then determined the general consensus tree (Penny 
et al., 1982) for these minimal trees. This consensus tree is also within this group of 
9 trees. There is a difficulty at this point in that we (Penny and Hendy, 1985) are 
suspicious of some dog sequences and would not like to draw any further conclusions 
until they are checked. 

Biologists seem to seek “The One Tree” and are not satisfied by a range of options. 
However there is no logical difficulty with having a range of trees. To be able to reduce 
34,459,425 trees down to the order of 9 is analogous to an accuracy of measurement 
of about one part in 106. Many measurements in biology are only accurate to one or 
two significant figures and pale when compared to physical measurements that may 
be accurate to ten significant figures. To be able to estimate an accuracy of one tree 
in 106 reflects the increasing sophistication of tree reconstruction methods. (Note that 
on this argument, to identify an organism to a species is analogous to a measurement 
with an accuracy also of about one in 10°.) 


Discussion 


The problem of weighting has a long history in biology and for over 200 years biologists 
have debated its merits. With the development of computer methods and numerical 
taxonomy there has been an insistence that methods for determining relationships be 
made objective (Sokal and Sneath, 1963; Sneath and Sokal, 1973). As a consequence 
of this requirement for objectivity, the weighting of characters has generally fallen into 
disfavor. 

We conclude that the traditional approach was correct in maintaining that not all 
information was equally reliable for phylogeny, but that it is not possible to meet the 
objections of numerical taxonomists by producing an objective and testable method. 

The approach used in this study does require a lot of computer time. The present 
study, disregarding development time, has used over 150 hours of CPU time on a Prime 
750 computer. Most of this was taken in finding large numbers of minimal and near 
minimal trees for about 150 subsets of data, with and without various forms of weighting. 
However the branch and bound method (Hendy and Penny, 1982) with squares 
weighting runs 9-30 times faster (average 15) than with unweighted data. It required 
only 174 seconds to find the minimal tree with squares weighting with all 132 columns. 
All of the analyses are available on one large interactive program, although it is often 
more convenient to run the branch and bound algorithm for minimal trees as a batch job. 

We cannot claim to have found the best method(s) for weighting. Other weighting 
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Fig. 5. Minimal tree from full data (132 columns) and with squares weighting. The root of the tree has 
been placed between kangaroo and the eutherians. This is also the general consensus tree (Penny et al., 
1982) obtained from the minimal length trees for the bootstrap (Efron, 1982) samples. 


methods have been tried in this study, including probability and likelihood methods 
based on the distribution of pairs of character states, and also calculating the weights 
recursively be deleting the column with the most negative weight and recalculating the 
weights. The present method has proved the most powerful of those tested, probably 
because it allows the detection of nucleotide positions that deviate from random by having 
either more, or fewer, incompatibilities than expected by chance. 

We have not tested triple or higher groups of characters, some possible maximum 
likelihood methods (Felsenstein, 1981), identifying duplicated character states (Guise 
et al., 1982), other recursive optimization approaches (Farris, 1969), nor some possible 
discriminant functions. The general approach allows these other methods of weighting 
to be tested. The method can also be used to compare different approaches to tree 
building such as cladistics and phenetics. Do minimal length trees converge faster as 
more data is included, as is apparently the case (Sokal, 1983)? Questions such as this 
can be tested with all the data included, not just with half the data. Again, it can be 
tested whether trees converge faster with molecular data if the molecular clock is assumed. 

The reason that the weighting is giving better results is quite likely related to the 
calculations of Felsenstein (1983) that the minimal length tree is expected to converge 
on the true historical tree T* if the probability of change between nucleotides is low. 
If there is a high probability for nucleotide changes then the minimal tree will not 
necessarily converge on the true T* as more positions are added. The difficulty in using 
Felsenstein’s analysis has been that the probability of change of any character was not 
known, and almost certainly varies at different positions. With the present form of 
weighting, based on the observed and expected number of incompatibilities, we are 
almost certainly weighting more highly those nucleotides that have changed more slowly. 

Finally, this empirical approach to testing methods of tree construction makes the 
minimum number of assumptions about the data. It then allows a test of whether specific 
assumptions about the data, or about the mechanism of change, give more accurate trees. 
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